## Case 1: training on whole population

Match AA and WA within each site. Combine all sites into 10 folds based on the number of matched AA-WA pairs. Select 3 of the 10 folds as the test set and the remaining 7 folds as the training set, which gives 120 train/test variations in total. Generate output `.mat` files in the format required by the subsequent kernel ridge regression.
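
The count of 120 variations follows from choosing 3 test folds out of 10. A minimal sketch of that enumeration is shown below; the variable names are illustrative and are not taken from the repository scripts, which may organize the splits differently.

```matlab
% Sketch: enumerate every way to hold out 3 of 10 folds as the test set.
% num_folds and num_test mirror the numbers in the text; all variable
% names here are hypothetical.
num_folds = 10;
num_test = 3;

test_combos = nchoosek(1:num_folds, num_test);   % one row per split
num_splits = size(test_combos, 1);               % number of variations

% For a given split, the training folds are the remaining 7 folds.
split_idx = 1;
test_folds = test_combos(split_idx, :);
train_folds = setdiff(1:num_folds, test_folds);

fprintf('%d splits; split %d tests on folds [%s]\n', ...
    num_splits, split_idx, num2str(test_folds));
```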

Top-level script: `ABCD_match_and_split.m`. It calls the other scripts in order: `ABCD_match_WAtoAA_within_site.m`, `ABCD_stats_MatchDiff_AAvsWA.m`, `ABCD_split_folds.m`, `ABCD_split_unselected.m`. Example usage of `ABCD_match_and_split.m` (MATLAB):

```matlab
ABCD_match_and_split('/path/to/lists/phenotypes_pass_rs.txt', ...
    '/path/to/lists/subjects_pass_rs_pass_pheno.txt', ...
    'race', 'site', 'family_id', '/path/to/lists/confounds_list.txt', ...
    '/path/to/lists/behavior_list.txt', 100, 2.45, '/output/dir/', ...
    '_pass_rs_pass_pheno')
```

`'/path/to/lists/phenotypes_pass_rs.txt'` is the CSV file generated by `../preparation/ABCD_read_all_measures.m`. `'/path/to/lists/subjects_pass_rs_pass_pheno.txt'` is the list of subjects who passed all quality controls and had all required phenotypes, generated by `../preparation/ABCD_read_all_measures.m`. `'/path/to/lists/confounds_list.txt'` is a user-specified list of the confounding variables to be matched. `'/path/to/lists/behavior_list.txt'` is the list of behavioral names generated by `../preparation/ABCD_read_all_measures.m`. `100` and `2.45` are the parameters used for matching. They were chosen to keep the matching cost sufficiently low while retaining as many matched pairs as possible. `'_pass_rs_pass_pheno'` is a user-specified string appended to the output filenames.

## Case 2: training on all AA and WA with the same sample size

This step relies on the folds generated in `Case 1`. For each fold, all AA (matched and unmatched) in the fold are selected, together with the same number of WA. In rare cases, a site has fewer WA than AA; for these sites, AA are randomly subsampled to match the number of WA. Use `ABCD_select_allAA_randWA.m` to execute this procedure.

```matlab
ABCD_select_allAA_randWA('/path/to/lists/phenotypes_pass_rs.txt', ...
    'race', 'site', '/path/to/lists/behavior_list.txt', ...
    '/path/to/lists/subjects_pass_rs_pass_pheno.txt', '/path/to/split/folds/', ...
    '_pass_rs_pass_pheno')
```

`'/path/to/lists/phenotypes_pass_rs.txt'` is the CSV file generated by `../preparation/ABCD_read_all_measures.m`. `'/path/to/lists/behavior_list.txt'` is the list of behavioral names generated by `../preparation/ABCD_read_all_measures.m`. `'/path/to/lists/subjects_pass_rs_pass_pheno.txt'` is the list of subjects who passed all quality controls and had all required phenotypes, generated by `../preparation/ABCD_read_all_measures.m`. `'_pass_rs_pass_pheno'` is a user-specified string appended to the output filenames; it must match the string used for `ABCD_match_and_split.m`.
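
The selection rule for a single fold can be illustrated with a small toy example. This is only a sketch of the idea described above, not the code in `ABCD_select_allAA_randWA.m`: the data, the race coding (1 = AA, 2 = WA), and the variable names are all assumptions made for illustration.

```matlab
% Sketch of the per-site selection rule for one fold (hypothetical toy data).
% Assumed coding for illustration only: race 1 = AA, race 2 = WA.
rng(0);                                   % fixed seed so the sketch is repeatable
site = [1 1 1 1 1 2 2 2 2]';
race = [1 1 2 2 2 1 1 1 2]';

sel = false(size(site));                  % marks the selected subjects
sites = unique(site);
for s = 1:numel(sites)
    aa = find(site == sites(s) & race == 1);
    wa = find(site == sites(s) & race == 2);
    n = min(numel(aa), numel(wa));        % equal sample size within the site
    % Usual case: n equals the number of AA, so all AA are kept and WA are
    % randomly subsampled. Rare case: fewer WA than AA, so AA are
    % randomly subsampled instead.
    aa_keep = aa(randperm(numel(aa), n));
    wa_keep = wa(randperm(numel(wa), n));
    sel([aa_keep; wa_keep]) = true;
end

fprintf('Selected %d AA and %d WA\n', sum(sel & race == 1), sum(sel & race == 2));
```

In this toy data, site 1 has more WA than AA (the usual case) and site 2 has fewer WA than AA (the rare case), so both branches of the rule are exercised.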

Then save the fold-split files in the format required by the CBIG kernel ridge regression package:

```matlab
ABCD_create_subfold_allAA_randWA('/path/to/lists/subjects_pass_rs_pass_pheno.txt', ...
    '/path/to/lists/behavior_list.txt', '/path/to/split/folds/', '_pass_rs_pass_pheno')
```

## Case 3: training solely on AA

This step relies on the output files of `ABCD_select_allAA_randWA.m`. It extracts the AA selected in `Case 2` and saves them into `.mat` files in the format required by the CBIG kernel ridge regression package. Use `ABCD_create_subfold_allAA.m` to execute this procedure:

```matlab
ABCD_create_subfold_allAA('/path/to/lists/subjects_pass_rs_pass_pheno.txt', ...
    '/path/to/lists/behavior_list.txt', '/path/to/split/folds/', '_pass_rs_pass_pheno')
```

The input arguments are the same as those described in `Case 2`.

## Case 4: training solely on WA

This step relies on the output files of `ABCD_select_allAA_randWA.m`. It extracts the WA selected in `Case 2` and saves them into `.mat` files in the format required by the CBIG kernel ridge regression package. Use `ABCD_create_subfold_randWA.m` to execute this procedure:

```matlab
ABCD_create_subfold_randWA('/path/to/lists/subjects_pass_rs_pass_pheno.txt', ...
    '/path/to/lists/behavior_list.txt', '/path/to/split/folds/', '_pass_rs_pass_pheno')
```

The input arguments are the same as those described in `Case 2`.

## Post-hoc analyses (sub-analyses in the manuscript)

### Compare behavioral variances between matched AA and WA

Test whether the variances of behavioral scores differ significantly between matched AA and WA using Levene's test.

Example (matlab):
```matlab
ABCD_pheno_var_AAvsWA_matched(...
    '/path/to/matched_AA_WA/mat/files/sel_AAWA_pass_rs_pass_pheno.mat', ...
    '/path/to/lists/subjects_pass_rs_pass_pheno.txt', ...
    '/path/to/lists/phenotypes_pass_rs.txt', '/output/filename.mat')
```

`'/path/to/matched_AA_WA/mat/files/sel_AAWA_pass_rs_pass_pheno.mat'` is the output file of `ABCD_match_and_split.m`. It contains the IDs of the selected matched AA-WA pairs and their corresponding confounding variables and behavioral scores. `'/path/to/lists/subjects_pass_rs_pass_pheno.txt'` is the list of subjects who passed all quality controls and had all required phenotypes, generated by `../preparation/ABCD_read_all_measures.m`. `'/path/to/lists/phenotypes_pass_rs.txt'` is the CSV file generated by `../preparation/ABCD_read_all_measures.m`.
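
For readers unfamiliar with Levene's test, the following is a minimal two-group sketch using only base MATLAB. It uses absolute deviations from the group means, which is one common variant; which variant `ABCD_pheno_var_AAvsWA_matched.m` applies is not stated here, so treat this as an illustration of the test rather than a description of the script. The scores are made-up numbers, not ABCD data.

```matlab
% Sketch of Levene's test for two groups, using base MATLAB only.
% The scores below are hypothetical and the variable names are illustrative.
aa_scores = [10 12 11 13 9 14]';
wa_scores = [11 11 12 10 12 11]';

groups = {aa_scores, wa_scores};
k = numel(groups);                         % number of groups
z = cell(k, 1);
for g = 1:k
    % Absolute deviation of each score from its own group mean.
    z{g} = abs(groups{g} - mean(groups{g}));
end
n = cellfun(@numel, z);                    % group sizes
N = sum(n);                                % total sample size
z_bar_g = cellfun(@mean, z);               % mean deviation per group
z_bar = sum(n .* z_bar_g) / N;             % overall mean deviation

between = sum(n .* (z_bar_g - z_bar).^2);
within = 0;
for g = 1:k
    within = within + sum((z{g} - z_bar_g(g)).^2);
end
W = (N - k) / (k - 1) * between / within;  % Levene's W statistic

% Upper tail of the F(k-1, N-k) distribution, written with the
% regularized incomplete beta function so no toolbox is needed.
d1 = k - 1;
d2 = N - k;
p = betainc(d2 / (d2 + d1 * W), d2 / 2, d1 / 2);

fprintf('Levene W = %.3f, p = %.3f\n', W, p);
```

If the Statistics and Machine Learning Toolbox is available, `vartestn` with `'TestType', 'LeveneAbsolute'` should perform the same kind of test without the manual computation.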
